Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP properties return values #1808

Closed
wants to merge 16 commits into from

Conversation

Manishearth
Copy link
Member

@Manishearth Manishearth commented Apr 19, 2022

Posting this up so that folks can review what I have so far before I go too deep into manually fixing the 68 properties

Ignore the individual commits, they're just how I did the change incrementally. I recommend reviewing the final state of provider.rs, maps.rs, and sets.rs

Basically, this introduces:

  • CodePointSet, CodePointMap: two serializable data types for storing code point data
  • CodePointSetLikeProperty, CodePointMapLikeProperty: traits for properties that are setlike/maplike
  • CodePointSetData, CodePointMapData: Wrappers around DataPayload that expose appropriate functions

Each property gets a marker type like WordBreakProperty. I decided to not call them Marker to avoid confusion.

I've used a macro to make it easier to define these.

This does not touch the uniset or CPT crates; they can be renamed however we want them to be in the future. Note that right now this crate introduces a CodePointSet and CodePointMap data type that contains UnicodeSet and CodePointTrie respectively, if these names end up clashing in the future we can either unify the types or pick different names. I don't want to spend too much time making that call in this PR since it will be easy to make later and this PR is going to be changing a lot already.

This also doesn't version the names yet. I'm not 100% sure what we should version here; and this is also something that can be done separately.

(I'm trying to reduce the scope of this PR to just be complex typesystem stuff)

One major change here is that previously all setlike properties had the same return value; and now they are all separate types with marker types. This has the benefit of being ResourceMarker-compatible, however is this actually what is desired?

If this design looks good I'm going to go ahead and update the rest of the sets to use it (and update tests, etc)

thoughts? @echeran @sffc

}

/// Properties that are stored in CodePointSets
pub trait CodePointSetLikeProperty {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.b. this and the other trait should just be CodePointSetProperty

CPSLike/etc haven't been introduced yet

#[cfg_attr(feature = "datagen", derive(serde::Serialize))]
#[cfg_attr(feature = "serde", derive(serde::Deserialize))]
pub struct UnicodePropertyV1<'data> {
pub struct CodePointSet<'data> {
/// The set of characters, represented as an inversion list
#[cfg_attr(feature = "serde", serde(borrow))]
pub inv_list: UnicodeSet<'data>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If/when you introduce a trait CodePointSetLike to represent the behavior of the CodePointSet data struct, would you be able to use that here somehow in the type of the field? Both the field name and its current type suggest a singular concrete implementation type.

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • components/properties/src/sets.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • components/properties/src/bidi.rs is now changed in the branch
  • components/properties/src/lib.rs is now changed in the branch
  • components/properties/src/maps.rs is different
  • components/properties/src/props.rs is now changed in the branch
  • components/properties/src/provider.rs is no longer changed in the branch
  • components/properties/src/sets.rs is different
  • experimental/collator/src/comparison.rs is now changed in the branch
  • experimental/collator/src/elements.rs is now changed in the branch
  • experimental/normalizer/src/lib.rs is now changed in the branch
  • ffi/diplomat/src/bidi.rs is now changed in the branch
  • ffi/diplomat/src/properties_maps.rs is now changed in the branch
  • ffi/diplomat/src/properties_sets.rs is now changed in the branch
  • provider/datagen/src/transform/cldr/list/mod.rs is now changed in the branch
  • provider/datagen/src/transform/segmenter/mod.rs is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@Manishearth
Copy link
Member Author

oops

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants